In [1]:
from __future__ import division, print_function, unicode_literals

%matplotlib inline

import os

import IPython.display
import numpy as np
import requests
import requests_oauthlib
import oauthlib
import arrow
import BeautifulSoup

import json_io
import yaml_io
import utilities

import twitter

Twitter Search

The following set of links to Twitter's documentation are those I found most useful:

The page Help with the Search API has this helpful tidbit for when you expect a large number of returned tweets. In that case it is important to pay attention to how you iterate through the results:

Iterating in a result set: parameters such count, until, since_id, max_id allow to control how we iterate through search results, since it could be a large set of tweets. The 'Working with Timelines' documentation is a very rich and illustrative tutorial to learn how to use these parameters to achieve the best efficiency and reliability when processing result sets.
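A minimal sketch of the max_id iteration pattern the quote refers to. The fetch_page callable and the fake data are hypothetical stand-ins for a real API call; the only assumption about the API is that max_id returns tweets with an id less than or equal to the given value.

```python
def iterate_pages(fetch_page, count=100):
    """Yield tweets from newest to oldest by walking max_id backwards.

    fetch_page(max_id, count) is a hypothetical callable that returns a
    list of tweet dicts, newest first, each with id <= max_id.
    """
    max_id = None
    while True:
        page = fetch_page(max_id=max_id, count=count)
        if not page:
            break
        for tweet in page:
            yield tweet
        # Next request: only tweets strictly older than the oldest one seen.
        max_id = min(tweet['id'] for tweet in page) - 1


# Stand-in for the real API: 25 fake tweets with ids 25 down to 1.
FAKE_TWEETS = [{'id': i} for i in range(25, 0, -1)]


def fake_fetch(max_id=None, count=100):
    matches = [t for t in FAKE_TWEETS if max_id is None or t['id'] <= max_id]
    return matches[:count]


ids = [t['id'] for t in iterate_pages(fake_fetch, count=10)]
print(len(ids), ids[0], ids[-1])
```

Subtracting one from the smallest id keeps the boundary tweet from being returned twice on the next page.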

API Status

I have written a module named twitter.py which contains useful functions and classes based on what I learned in the previous notebook. One of the first capabilities I added was a function to generate a session object from the requests package, authorized via OAuth 2.

The cell below demonstrates querying the Twitter API for information on my account's current rate limit status.


In [8]:
session = utilities.authenticate()

print('\nclient_id: {:s}'.format(session.client_id.client_id))

info = utilities.rate_limit_from_api(session)

print('\nRate Status')
print('-----------')
print('Limit:     {:d}'.format(info['limit']))
print('Remaining: {:d}'.format(info['remaining']))

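delta = arrow.get(info['reset']) - arrow.now()
seconds = delta.total_seconds()

# Split into whole minutes and leftover seconds.  Rounding fractional
# minutes with {:.0f} would print 15 min 59.5 s as 16:59.5.
minutes, secs = divmod(seconds, 60)

print('Reset:    {:02d}:{:04.1f}'.format(int(minutes), secs))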


client_id: JxUV7dXAvXigyxyWafOGUA

Rate Status
-----------
Limit:     450
Remaining: 450
Reset:    15:59.5
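For reference, a sketch of how the search limit could be pulled out of the JSON that Twitter's application/rate_limit_status endpoint returns. The sample payload is made up for illustration, and search_rate_limit and time_until_reset are hypothetical helpers, not the functions that live in utilities.

```python
def search_rate_limit(payload):
    """Return the limit/remaining/reset entry for the search endpoint."""
    return payload['resources']['search']['/search/tweets']


def time_until_reset(info, now):
    """Whole minutes and leftover seconds until the window resets.

    info['reset'] and now are both epoch seconds.
    """
    seconds = max(0.0, info['reset'] - now)
    minutes, secs = divmod(seconds, 60)
    return int(minutes), secs


# Illustrative payload only; a real one comes from the API.
sample = {
    'resources': {
        'search': {
            '/search/tweets': {
                'limit': 450,
                'remaining': 450,
                'reset': 1388600000,
            }
        }
    }
}

info = search_rate_limit(sample)
minutes, secs = time_until_reset(info, now=1388600000 - 959.5)
print('Reset: {:02d}:{:04.1f}'.format(minutes, secs))
```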

Practice Search

Another important capability in twitter.py is the ability to search for Tweets matching a specified text pattern. The primary interface to search is through the class Tweet_Search. Calling the method run() on an instance returns a generator, allowing for efficient retrieval of matching tweets.

The cell below shows a simple example. The query "grey hound dog" gets relatively few hits, about once per day or so, which makes it nice for testing. If I comment out this query and try again with something like the title of a currently popular movie, I receive many tens of thousands of tweets. That is when it became clear to me that I needed another layer to manage larger volumes of tweets.


In [7]:
query = 'grey hound dog'
# query = 'hobbit desolation smaug'

# Output folder.  Will be created if it does not already exist.
path_example = os.path.join(os.path.curdir, 'tweets_testing_one')
    
# Build a search object that knows how to talk to Twitter's API.
searcher = twitter.Search(session)

# Run a search for a specific query string, operates as a generator.
gen = searcher.run(query)

# Loop over returned results.
for k, tw in enumerate(gen):
    print('\n{:3d} | {:s} | {:s}'.format(k, str(tw.timestamp), tw.text))

    # Save Tweet to file.
    tw.serialize(path_example)

    # Stop the search if it goes on for too long.
    if k > 250:
        break


---------------------------------------------------------------------------
APILimitError                             Traceback (most recent call last)
<ipython-input-7-253e30a12658> in <module>()
     12 
     13 # Loop over returned results.
---> 14 for k, tw in enumerate(gen):
     15     print('\n{:3d} | {:s} | {:s}'.format(k, str(tw.timestamp), tw.text))
     16 

/home/pierre/Projects/HackingForMovieTrends/twitter.py in run(self, query, since_id, max_id, result_type, lang)
    562                 self.save_config()
    563 
--> 564                 raise errors.APILimitError('API limit exceeded: {:s}'.format(msg))
    565 
    566             if len(tweets) == 0:

APILimitError: API limit exceeded: Rate limit exceeded
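The traceback above shows the generator raising once the rate limit is reached. One way to keep whatever was collected before the error, and to cap the number of results without a counter, is sketched below. The APILimitError class here is a local stand-in for the one in the traceback, and fake_search and collect are hypothetical names used only for this illustration.

```python
import itertools


class APILimitError(Exception):
    """Local stand-in for the errors.APILimitError seen in the traceback."""


def fake_search(n_available, limit_after):
    """Hypothetical generator: yields tweets, then fails like a rate-limited API."""
    for k in range(n_available):
        if k == limit_after:
            raise APILimitError('Rate limit exceeded')
        yield {'id': k}


def collect(gen, max_items):
    """Consume at most max_items from gen, keeping results if the limit is hit."""
    results = []
    hit_limit = False
    try:
        # islice stops pulling from the generator after max_items.
        for tw in itertools.islice(gen, max_items):
            results.append(tw)
    except APILimitError:
        hit_limit = True
    return results, hit_limit


tweets, hit_limit = collect(fake_search(1000, limit_after=30), max_items=250)
print(len(tweets), hit_limit)
```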

Pretty

The Tweet class will also produce nicely-formatted text from the internal JSON data.


In [4]:
print(tw)


Tweet Object:
{
    "contributors": null,
    "coordinates": null,
    "created_at": "Thu Dec 26 15:03:01 +0000 2013",
    "entities": {
        "hashtags": [],
        "symbols": [],
        "urls": [
            {
                "display_url": "ask.fm/a/a5biabm9",
                "expanded_url": "http://ask.fm/a/a5biabm9",
                "indices": [
                    116,
                    138
                ],
                "url": "http://t.co/N1CLq7YESK"
            }
        ],
        "user_mentions": []
    },
    "favorite_count": 0,
    "favorited": false,
    "geo": null,
    "id": 416222626923954176,
    "id_str": "416222626923954176",
    "in_reply_to_screen_name": null,
    "in_reply_to_status_id": null,
    "in_reply_to_status_id_str": null,
    "in_reply_to_user_id": null,
    "in_reply_to_user_id_str": null,
    "lang": "en",
    "metadata": {
        "iso_language_code": "en",
        "result_type": "recent"
    },
    "place": null,
    "possibly_sensitive": false,
    "retweet_count": 0,
    "retweeted": false,
    "source": "<a href=\"http://ask.fm/\" rel=\"nofollow\">Ask.fm</a>",
    "text": "What is your favorite dog breed? \u2014 the italian grey hound.I fell in love when i first watched a documentary of t... http://t.co/N1CLq7YESK",
    "truncated": false,
    "user": {
        "contributors_enabled": false,
        "created_at": "Thu Dec 03 11:03:54 +0000 2009",
        "default_profile": false,
        "default_profile_image": false,
        "description": "Just so you know : I love happy endings.\r\nIsaiah 55:8-11",
        "entities": {
            "description": {
                "urls": []
            }
        },
        "favourites_count": 349,
        "follow_request_sent": null,
        "followers_count": 149,
        "following": null,
        "friends_count": 278,
        "geo_enabled": false,
        "id": 94306100,
        "id_str": "94306100",
        "is_translator": false,
        "lang": "en",
        "listed_count": 0,
        "location": "Malaysia",
        "name": "chloe cheong jo anne",
        "notifications": null,
        "profile_background_color": "ACDED6",
        "profile_background_image_url": "http://a0.twimg.com/profile_background_images/378800000155200473/FNPqfPba.jpeg",
        "profile_background_image_url_https": "https://si0.twimg.com/profile_background_images/378800000155200473/FNPqfPba.jpeg",
        "profile_background_tile": false,
        "profile_banner_url": "https://pbs.twimg.com/profile_banners/94306100/1388041696",
        "profile_image_url": "http://pbs.twimg.com/profile_images/413839833308471296/3jDWV9vr_normal.jpeg",
        "profile_image_url_https": "https://pbs.twimg.com/profile_images/413839833308471296/3jDWV9vr_normal.jpeg",
        "profile_link_color": "038543",
        "profile_sidebar_border_color": "000000",
        "profile_sidebar_fill_color": "F6F6F6",
        "profile_text_color": "333333",
        "profile_use_background_image": true,
        "protected": false,
        "screen_name": "ChloeCheong13",
        "statuses_count": 1719,
        "time_zone": "Pacific Time (US & Canada)",
        "url": null,
        "utc_offset": -28800,
        "verified": false
    }
}
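Output in this style can be produced with the standard json module. The sketch below is an assumption about how such a __str__ method might look, not the actual code in twitter.py; PrettyTweet is a hypothetical class name.

```python
import json


class PrettyTweet(object):
    """Hypothetical wrapper around the decoded JSON of one tweet."""

    def __init__(self, data):
        self.data = data

    def __str__(self):
        # Sorted keys and a four-space indent give stable, readable output.
        body = json.dumps(self.data, indent=4, sort_keys=True)
        return 'Tweet Object:\n' + body


tw_demo = PrettyTweet({
    'id': 416222626923954176,
    'lang': 'en',
    'geo': None,
    'favorited': False,
})
text = str(tw_demo)
print(text)
```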

Tweet Manager

Tweet_Manager is a class to manage a large collection of Tweets. The cell below makes a new manager for the Tweets saved by the practice search above.


In [5]:
mgr_tweets = twitter.Tweet_Manager(path_example)

print(mgr_tweets.count)

print(mgr_tweets.min_id)
print(mgr_tweets.max_id)

print(mgr_tweets.min_timestamp)
print(mgr_tweets.max_timestamp)


15
416222626923954176
418470305263542272
2013-12-26T15:03:01+00:00
2014-01-01T19:54:30+00:00
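A rough sketch of how summary values like the count and id range could be computed from a folder of saved tweets. It assumes one .json file per tweet, each holding a dict with an 'id' key; that layout and the helper name summarize_folder are assumptions for illustration, not a description of Tweet_Manager's internals.

```python
import glob
import json
import os
import tempfile


def summarize_folder(path):
    """Count the tweets in a folder and report the smallest and largest id."""
    ids = []
    for fname in glob.glob(os.path.join(path, '*.json')):
        with open(fname) as fp:
            ids.append(json.load(fp)['id'])
    if not ids:
        return {'count': 0, 'min_id': None, 'max_id': None}
    return {'count': len(ids), 'min_id': min(ids), 'max_id': max(ids)}


# Build a throwaway folder holding three tiny tweet files.
folder = tempfile.mkdtemp()
for tweet_id in (416222626923954176, 417207453017583616, 418470305263542272):
    with open(os.path.join(folder, '{:d}.json'.format(tweet_id)), 'w') as fp:
        json.dump({'id': tweet_id}, fp)

summary = summarize_folder(folder)
print(summary['count'], summary['min_id'], summary['max_id'])
```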

Combined Forces

The previous search can be simplified using a Tweet_Manager to help with paths and file names.


In [6]:
# Output folder.
path_example = os.path.join(os.path.curdir, 'tweets_testing_two')
    
searcher = twitter.Search(session)

mgr_tweets = twitter.Tweet_Manager(path_example)

# Run a search for a specific query string, operates as a generator.
gen = searcher.run(query)

# Loop over returned results.
for k, tw in enumerate(gen):
    print('\n{:3d} | {:s} | {:s}'.format(k, str(tw.timestamp), tw.text))

    mgr_tweets.add_tweet_obj(tw)

    # Stop the search if it goes on for too long.
    if k > 250:
        break
        
        
print()
print(mgr_tweets.count)

print(mgr_tweets.min_id)
print(mgr_tweets.max_id)

print(mgr_tweets.min_timestamp)
print(mgr_tweets.max_timestamp)


  0 | 2014-01-01T19:54:30+00:00 | Saw the best thing ever this year, this morning! 

Out with Charlie dog, we met this couple who had a grey hound... 

  1 | 2014-01-01T12:32:57+00:00 | when is a black dog not a black dog? when it's a grey--hound!

  2 | 2014-01-01T08:54:37+00:00 | caravan hound into grey hound puppies available: Hi this is hunain i have caravan hound into gr... 

  3 | 2014-01-01T06:56:51+00:00 | Dog owners: My 13 yr old german shep/grey hound is limping all of a sudden and welped in pain just laying there. Can we give her anything?

  4 | 2013-12-31T07:00:30+00:00 | RT @LarryHoover__: Kamal look like a grey hound dog

  5 | 2013-12-31T06:57:43+00:00 | Kamal look like a grey hound dog

  6 | 2013-12-29T23:28:47+00:00 | RT @iSpankHim: Alex would be an Italian grey hound if he was a dog

  7 | 2013-12-29T22:39:59+00:00 | Alex would be an Italian grey hound if he was a dog

  8 | 2013-12-29T18:31:56+00:00 | I will have a weiner dog, a husky, and a grey hound when I get older. 😍😍😍

  9 | 2013-12-29T13:48:59+00:00 | Are back door was open and a stray dog just walked into are house he is a grey hound puppy:( collar  no address...RSPCA time :) #puppy#help

 10 | 2013-12-29T08:16:22+00:00 | "And over here is where you saw a grey hound dog" 
#ghostadventures  @Zak_Bagans @agoodwincollect @NickGroff_

 11 | 2013-12-28T12:13:35+00:00 | @2coolKittypurry the only big dog ive been bitten by is a grey hound

 12 | 2013-12-28T04:11:20+00:00 | that blue grey coat on a dog like a pure grey hound has is a pretty coat, some terriers have it too

 13 | 2013-12-27T23:10:37+00:00 | Im Threw Fuckin Wit T Not She Said The Girl Look Like A Grey Hound Dog!!

 14 | 2013-12-26T15:03:01+00:00 | What is your favorite dog breed? — the italian grey hound.I fell in love when i first watched a documentary of t... 

15
416222626923954176
418470305263542272
2013-12-26T15:03:01+00:00
2014-01-01T19:54:30+00:00

Iterate over Tweets

Iterate over all example Tweets via generator and print out some interesting details.

Warning: don't run the next few cells if the example search above returned a really large number of Tweets...


In [7]:
for tw in mgr_tweets.tweets:
    print('{:d} | {:<10s} | {:s}'.format(tw.id_int, tw.source, tw.source_full))
#     print('{:d} | filter: {:5s}'.format(tw.id_int, str(twitter.filter(tw))))
#     print('{:d} --> {:s}'.format(tw.id_int, tw.text))
#     print('{:d} | {:s} | {:5s} | {:2d}'.format(tw.id_int, str(tw.timestamp), str(tw.retweet), tw.retweet_count))


416222626923954176 | unknown    | <a href="http://ask.fm/" rel="nofollow">Ask.fm</a>
416707722974085121 | txt        | <a href="http://twitter.com/devices" rel="nofollow">txt</a>
416783400435847168 | web        | web
416904761775255552 | iphone     | <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>
417207453017583616 | android    | <a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>
417291156096307200 | blackberry | <a href="http://blackberry.com/twitter" rel="nofollow">Twitter for BlackBerry®</a>
417362362862034944 | iphone     | <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>
417424789511213056 | iphone     | <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>
417437066792685568 | android    | <a href="https://twitter.com/download/android" rel="nofollow">Twitter for  Android</a>
417912432867422209 | iphone     | <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>
417913133689102336 | iphone     | <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>
418274603287334912 | android    | <a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>
418304240847568896 | unknown    | <a href="http://twitterfeed.com" rel="nofollow">twitterfeed</a>
418359188163538944 | web        | web
418470305263542272 | unknown    | <a href="http://www.facebook.com/twitter" rel="nofollow">Facebook</a>
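The short labels in the middle column can be derived from the anchor text of the full source string. The sketch below reproduces the labels shown above; the function name short_source and the matching rules are assumptions, not the actual logic behind tw.source.

```python
import re

# Capture the visible text of an HTML anchor such as <a href="...">text</a>.
ANCHOR_TEXT = re.compile(r'<a [^>]*>(.*?)</a>')


def short_source(source_full):
    """Map a tweet's full source string to a short device label."""
    match = ANCHOR_TEXT.search(source_full)
    label = match.group(1) if match else source_full
    # Lower-case and collapse runs of whitespace ("Twitter for  Android").
    label = ' '.join(label.lower().split())
    if label in ('web', 'txt'):
        return label
    for name in ('iphone', 'android', 'blackberry'):
        if label.startswith('twitter for ' + name):
            return name
    return 'unknown'


examples = [
    'web',
    '<a href="http://twitter.com/devices" rel="nofollow">txt</a>',
    '<a href="https://twitter.com/download/android" rel="nofollow">Twitter for  Android</a>',
    '<a href="http://ask.fm/" rel="nofollow">Ask.fm</a>',
]
labels = [short_source(s) for s in examples]
print(labels)
```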

Search Manager

Next up is a Search_Manager to help search for new Tweets to add to a new or existing collection described by a Tweet_Manager. So far I have an easy way to serialize a Tweet to a .json file. But I may need to restart a search that was interrupted, either by me or by hitting the rate limit. I will also want to refresh a given search sometime in the future. Search_Manager is implemented as a subclass of Tweet_Manager and makes direct use of the Tweet_Search class.
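The two cases, resuming an interrupted search and refreshing an old one, come down to choosing the right id bound from the tweets already on disk. A minimal sketch, assuming the API's since_id returns tweets with larger ids and max_id returns tweets with ids less than or equal to the value; next_search_window is a hypothetical helper, not a method of Search_Manager.

```python
def next_search_window(stored_ids, mode):
    """Pick search parameters based on the ids already collected.

    mode='refresh': fetch only tweets newer than anything stored.
    mode='resume':  continue backwards from the oldest stored tweet.
    """
    if mode not in ('refresh', 'resume'):
        raise ValueError('unknown mode: {}'.format(mode))
    if not stored_ids:
        # Nothing on disk yet, so search without bounds.
        return {}
    if mode == 'refresh':
        return {'since_id': max(stored_ids)}
    # Minus one so the oldest stored tweet is not fetched again.
    return {'max_id': min(stored_ids) - 1}


stored = [416222626923954176, 417207453017583616, 418470305263542272]
print(next_search_window(stored, 'refresh'))
print(next_search_window(stored, 'resume'))
```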


In [13]:
path_example = os.path.join(os.path.curdir, 'tweets_testing_three')

query = 'grey hound dog'
# query = 'happy dog'
# query = 'hobbit desolation smaug'

manager = twitter.Search_Manager(session, query, path_example)

manager.search()

print(manager.count)
print(manager.min_timestamp)
print(manager.max_timestamp)
print(manager.api_remaining)


16
2013-12-26T15:03:01+00:00
2014-01-01T19:54:30+00:00
151

In [12]:
path_query = os.path.join(os.path.curdir, 'tweets')

# query = 'hobbit desolation smaug'
# query = 'Anchorman 2 The Legend Continues'
query = 'Anchorman'

manager = twitter.Search_Manager(session, query, path_query)

manager.search_continuous()

print(manager.api_remaining)


count: 162254
User interupt!
count: 163100
154
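The output above shows the continuous search being stopped by hand. A loop of that kind can catch KeyboardInterrupt so that the work done so far is kept. This is only a sketch of the pattern; run_until_interrupted and fake_step are hypothetical names, and the real search_continuous may work differently.

```python
def run_until_interrupted(step, max_rounds=None):
    """Call step() repeatedly until interrupted or max_rounds is reached.

    Returns the number of rounds that completed.
    """
    rounds = 0
    try:
        while max_rounds is None or rounds < max_rounds:
            step()
            rounds += 1
    except KeyboardInterrupt:
        # Interrupting the kernel (or Ctrl-C) lands here instead of a traceback.
        print('User interrupt!')
    return rounds


# Simulate a user interrupt on the fourth call.
calls = []


def fake_step():
    if len(calls) == 3:
        raise KeyboardInterrupt
    calls.append(len(calls))


completed = run_until_interrupted(fake_step)
print('completed rounds:', completed)
```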
